The goal of this assignment is to explore more advanced techniques for constructing features that better describe objects of interest, and to perform face recognition using these features. This assignment will be delivered in groups of 5 (either composed by you or randomly assigned by your TAs).
In this assignment you are a group of computer vision experts who have been invited to ECCV 2021 to give a tutorial on "Feature representations, then and now". To prepare the tutorial, you are asked to participate in a Kaggle competition and to release a notebook that can be easily studied by the tutorial participants. Your target audience: (master's) students who want a first hands-on introduction to the techniques that you apply.
This notebook is structured as follows:
Make sure that your notebook is self-contained and fully documented. Walk us through all steps of your code. Treat your notebook as a tutorial for students who need a first hands-on introduction to the techniques that you apply. Provide strong arguments for the design choices that you made and the insights you gained from your experiments. Make use of the Group assignment forum/discussion board on Toledo if you have any questions.
The training set is many times smaller than the test set, which might strike you as odd; however, this is close to a real-world scenario where your system might be put through daily use! In this session we will try to do the best we can with the data that we've got!
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
import io # Input/Output Module
import os # OS interfaces
import cv2 # OpenCV package
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from urllib import request # module for opening HTTP requests
from matplotlib import pyplot as plt # Plotting library
%matplotlib inline
#!ls kaggle
# Input data files are available in the read-only "../input/" directory
train = pd.read_csv(
'/kaggle/input/kul-h02a5a-computer-vision-ga1-2022/train_set.csv', index_col = 0)
train.index = train.index.rename('id')
test = pd.read_csv(
'/kaggle/input/kul-h02a5a-computer-vision-ga1-2022/test_set.csv', index_col = 0)
test.index = test.index.rename('id')
# read the images as NumPy arrays (converted from BGR to RGB) and store them in an "img" column
train['img'] = [
    cv2.cvtColor(
        np.load('/kaggle/input/kul-h02a5a-computer-vision-ga1-2022/train/train_{}.npy'.format(index),
                allow_pickle=False),
        cv2.COLOR_BGR2RGB)
    for index in train.index]
test['img'] = [
    cv2.cvtColor(
        np.load('/kaggle/input/kul-h02a5a-computer-vision-ga1-2022/test/test_{}.npy'.format(index),
                allow_pickle=False),
        cv2.COLOR_BGR2RGB)
    for index in test.index]
train_size, test_size = len(train),len(test)
"The training set contains {} examples, the test set contains {} examples.".format(train_size, test_size)
'The training set contains 80 examples, the test set contains 1816 examples.'
#!head ./kaggle/train_set.csv
Note: this dataset is a subset of the VGG Face dataset.
Let's have a look at the data columns and class distribution.
# The training set contains an identifier, name, image information and class label
train.head(1)
| id | name | class | img |
|---|---|---|---|
| 0 | Mila_Kunis | 2 | [[[50, 31, 25], [49, 30, 24], [49, 30, 24], [4... |
# The test set only contains an identifier and corresponding image information.
test.head(1)
| id | img |
|---|---|
| 0 | [[[209, 210, 205], [208, 209, 204], [208, 209,... |
# The class distribution in the training set:
train.groupby('name').agg({'img':'count', 'class': 'max'})
| name | img | class |
|---|---|---|
| Jesse_Eisenberg | 30 | 1 |
| Michael_Cera | 10 | 0 |
| Mila_Kunis | 30 | 2 |
| Sarah_Hyland | 10 | 0 |
Note that Jesse is assigned classification label 1 and Mila classification label 2. The dataset also contains 20 images of look-alikes (assigned classification label 0), along with the raw images.
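To keep the label semantics explicit when we inspect predictions later, a small lookup from class label to name(s) can be built from the training frame. A minimal sketch (the helper name `class_name_map` is our own, not part of the assignment; the toy frame mirrors the table above):

```python
import pandas as pd

def class_name_map(df):
    """Build a {class label: set of names} lookup from a frame with 'name' and 'class' columns."""
    return df.groupby('class')['name'].agg(set).to_dict()

toy = pd.DataFrame({
    'name': ['Jesse_Eisenberg', 'Michael_Cera', 'Mila_Kunis', 'Sarah_Hyland'],
    'class': [1, 0, 2, 0],
})
class_name_map(toy)
# → {0: {'Michael_Cera', 'Sarah_Hyland'}, 1: {'Jesse_Eisenberg'}, 2: {'Mila_Kunis'}}
```

On the real data, `class_name_map(train)` would confirm that label 0 groups the two look-alikes.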
In this example we use Haar feature-based cascade classifiers to detect faces; the detected faces are then resized so that they all share the same shape. If there are multiple faces in an image, we only take the first one.
Temporary files can be written to /kaggle/temp/ or ../../tmp, but they won't be saved outside of the current session.
class HAARPreprocessor():
    """Preprocessing pipeline built around Haar feature-based cascade classifiers."""

    def __init__(self, path, face_size):
        self.face_size = face_size
        file_path = os.path.join(path, "haarcascade_frontalface_default.xml")
        # check if the model file exists and download it if it doesn't
        if not os.path.exists(file_path):
            if not os.path.exists(path):
                os.mkdir(path)
            self.download_model(file_path)
        self.classifier = cv2.CascadeClassifier(file_path)

    def download_model(self, path):
        url = "https://raw.githubusercontent.com/opencv/opencv/master/data/"\
              "haarcascades/haarcascade_frontalface_default.xml"
        with request.urlopen(url) as r, open(path, 'wb') as f:
            f.write(r.read())

    def detect_faces(self, img):
        """Detect all faces in an image."""
        # the images were converted to RGB when loading, so convert RGB (not BGR) to grayscale
        img_gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        return self.classifier.detectMultiScale(
            img_gray,
            scaleFactor=1.2,
            minNeighbors=5,
            minSize=(30, 30),
            flags=cv2.CASCADE_SCALE_IMAGE
        )

    def extract_faces(self, img):
        """Return all faces (cropped) in an image."""
        faces = self.detect_faces(img)
        return [img[y:y + h, x:x + w] for (x, y, w, h) in faces]

    def preprocess(self, data_row):
        faces = self.extract_faces(data_row['img'])
        # if no faces were found, return an all-NaN placeholder of the target shape
        if len(faces) == 0:
            nan_img = np.empty(self.face_size + (3,))
            nan_img[:] = np.nan
            return nan_img
        # only return the first face
        return cv2.resize(faces[0], self.face_size, interpolation=cv2.INTER_AREA)

    def __call__(self, data):
        # note: the int cast turns the NaN placeholders into undefined integer values
        return np.stack([self.preprocess(row) for _, row in data.iterrows()]).astype(int)
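Taking the first detection is a somewhat arbitrary choice: `detectMultiScale` returns its boxes in no particularly meaningful order, so on images with several detections the first box may not be the subject. A common alternative is to keep the largest detection. A minimal sketch of that idea (the helper `pick_largest_face` is our own addition, operating on the (x, y, w, h) boxes that `detectMultiScale` returns):

```python
import numpy as np

def pick_largest_face(detections):
    """Return the (x, y, w, h) box with the largest area from a detectMultiScale result."""
    boxes = np.asarray(detections)
    areas = boxes[:, 2] * boxes[:, 3]  # w * h per box
    return tuple(boxes[np.argmax(areas)])

pick_largest_face([(0, 0, 30, 30), (5, 5, 100, 100)])  # → (5, 5, 100, 100)
```

Swapping `faces[0]` for the crop of the largest box in `preprocess` would make the pipeline more robust to background faces, at the cost of occasionally preferring a large false positive.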
Let's define a function to plot a sequence of images.
FACE_SIZE = (100, 100)

def plot_image_sequence(data, n, imgs_per_row=7):
    n_rows = 1 + n // (imgs_per_row + 1)
    n_cols = min(imgs_per_row, n)
    f, ax = plt.subplots(n_rows, n_cols, figsize=(10 * n_cols, 10 * n_rows))
    for i in range(n):
        if n == 1:
            ax.imshow(data[i])
        elif n_rows > 1:
            ax[i // imgs_per_row, i % imgs_per_row].imshow(data[i])
        else:
            ax[i % n].imshow(data[i])
    plt.show()
We apply the HAARPreprocessor to our training and test data.
preprocessor = HAARPreprocessor(path = '../../tmp', face_size=FACE_SIZE)
train_X, train_y = preprocessor(train), train['class'].values
test_X = preprocessor(test)
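Because `preprocess` returns an all-NaN placeholder when no face is found, failed detections can be located on the float output of `preprocess`, before the final `astype(int)` cast (which turns NaN into meaningless integers). A minimal sketch; the helper name `failed_mask` is our own:

```python
import numpy as np

def failed_mask(faces):
    """Boolean mask marking images where preprocessing produced the all-NaN placeholder.

    `faces` is a float array of shape (n_images, height, width, 3).
    """
    return np.isnan(faces).all(axis=(1, 2, 3))

faces = np.zeros((3, 4, 4, 3))
faces[1] = np.nan  # simulate one failed detection
failed_mask(faces)  # → array([False, True, False])
```

Such a mask makes it easy to count failures per class or to exclude the placeholders before training a classifier.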
We use our plotting function to visualize the processed images.
This will allow us to see whether the faces were extracted well from our raw images.
plot_image_sequence(train_X[train_y == 0], n=20, imgs_per_row=10)
plot_image_sequence(train_X[train_y == 1], n=30, imgs_per_row=10)
plot_image_sequence(train_X[train_y == 2], n=30, imgs_per_row=10)
As we can see above, the faces extracted using the HAARPreprocessor are not ideal.
We can see 3 problems:
Let's create a class to plot the raw images so we can visualize them and see what the problem is.
class raw_images():
    def preprocess(self, data_row):
        # upscale each raw image to a common size for display
        return cv2.resize(data_row['img'], (600, 600), interpolation=cv2.INTER_AREA)

    def __call__(self, data):
        return np.stack([self.preprocess(row) for _, row in data.iterrows()]).astype(int)
preprocessor_raw = raw_images()
raw_X, raw_y = preprocessor_raw(train), train['class'].values
plot_image_sequence(raw_X[raw_y == 0], n=20, imgs_per_row=10)
plot_image_sequence(raw_X[raw_y == 1], n=30, imgs_per_row=10)
plot_image_sequence(raw_X[raw_y == 2], n=30, imgs_per_row=10)
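To quantify how often the detector fails or fires more than once, we can count detections per image and bucket the counts. A minimal sketch; `summarize_detections` is our own helper, and the commented line shows how it would be applied to the training images via `preprocessor.detect_faces`:

```python
from collections import Counter

def summarize_detections(counts):
    """Bucket per-image face counts into 'none' / 'one' / 'multiple'."""
    def label(c):
        return 'none' if c == 0 else ('one' if c == 1 else 'multiple')
    return Counter(label(c) for c in counts)

# On the real data this would be, e.g.:
# summarize_detections(len(preprocessor.detect_faces(img)) for img in train['img'])
summarize_detections([0, 1, 1, 2, 3])  # 'one': 2, 'multiple': 2, 'none': 1
```

A summary like this tells us whether the problems seen in the plots are isolated cases or systematic, which helps decide whether to tune `detectMultiScale` parameters or change the face-selection rule.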